Add Word-guess NemoGym GRPO training #1903
Draft
yyu22 wants to merge 6 commits into NVIDIA-NeMo:main from
Conversation
Signed-off-by: root <yayu@nvidia.com>
… for reasoning models

- Add grpo_wordle_nemotron_nano_v2_9b.yaml config for NemoGym Wordle training
- Fix _replace_prefix_tokens crash when chat templates strip reasoning tokens from prior assistant messages (e.g., Nemotron's <think>...</think> stripping)
- Fix _postprocess_nemo_gym_to_nemo_rl_result contiguity assertion for the same reasoning token stripping issue

Signed-off-by: root <yayu@nvidia.com>
Related: #1812. I'd like to disable forcing token-level on-policy, e.g. for agents with context management, but it feels like we shouldn't just quietly fall back to off-policy; it should be a config option at the very least. Rather than disabling the asserts for _replace_prefix_tokens, I think we should just do this for now: https://docs.nvidia.com/nemo/gym/latest/tutorials/nemo-rl-grpo/single-node-training.html#configure-the-chat-template, until we've tested disabling it thoroughly and added a config option.
Revert workaround changes to nemo_gym.py and vllm_worker_async.py. Instead, add a custom Nemotron chat template that preserves <think> tokens in prior assistant messages (no stripping), which keeps token alignment consistent across turns for _replace_prefix_tokens. Signed-off-by: root <yayu@nvidia.com>
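A minimal sketch of how such a template override can be wired up, assuming transformers' AutoTokenizer; the model id and template file name below are placeholders for illustration, not paths from this PR:

```python
from transformers import AutoTokenizer

# Placeholder model id and template path -- illustrative, not from this PR.
tok = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-Nano-9B-v2")
with open("nemotron_keep_think.jinja") as f:
    # Replace the stock template with one that keeps <think>...</think> in
    # prior assistant turns, so re-rendered prefixes stay token-aligned.
    tok.chat_template = f.read()
```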
Dependencies
Bug Fix: Token alignment with reasoning-stripping chat templates
Models like Nemotron Nano 9B v2 have chat templates that strip <think>...</think> blocks from prior assistant messages when re-rendering the conversation for later turns. This causes two assertion failures during NemoGym multi-turn tool-calling training:

- a crash in _replace_prefix_tokens, because the re-rendered prefix no longer matches the recorded generation tokens
- a contiguity assertion failure in _postprocess_nemo_gym_to_nemo_rl_result
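A minimal repro sketch of the misalignment, assuming the stock Nemotron template and transformers' apply_chat_template; the model id and conversation content are made up for illustration:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-Nano-9B-v2")  # assumed model id

turn1 = [
    {"role": "user", "content": "Guess the word."},
    {"role": "assistant", "content": "<think>start vowel-heavy</think>My guess: ADIEU"},
]
turn2 = turn1 + [{"role": "user", "content": "A is green, the rest are grey."}]

ids1 = tok.apply_chat_template(turn1, tokenize=True)
ids2 = tok.apply_chat_template(turn2, tokenize=True, add_generation_prompt=True)

# With a template that strips <think>...</think> from prior assistant turns,
# ids1 is no longer a prefix of ids2, so token-level alignment breaks.
print(ids2[: len(ids1)] == ids1)  # False when the template strips reasoning
```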
Fix: fall back to the template's token IDs when alignment fails instead of crashing. Note: this causes token duplication in affected samples, which may slightly impact training quality. The proper fix would be to strip the thinking tokens from generation_token_ids before recording, matching what the template does on re-render.
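In sketch form, the described fallback looks roughly like the following; the function and argument names are illustrative, not the actual NeMo-RL internals:

```python
def splice_generation_tokens(rendered_ids, prefix_ids, generation_ids):
    """Illustrative sketch of the fallback described above (not the real code).

    Normally the recorded on-policy generation tokens are spliced over the
    re-rendered prefix. If the template changed the prefix (e.g. stripped
    <think> spans), fall back to the template's own token IDs rather than
    asserting; this can duplicate thinking tokens in affected samples.
    """
    n = len(prefix_ids)
    if rendered_ids[:n] == prefix_ids:
        # Aligned: keep the exact tokens the model generated (token-level on-policy).
        return prefix_ids + generation_ids
    # Misaligned: trust the re-rendered template tokens instead of crashing.
    return rendered_ids
```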
Training Setup
Generate Wordle data (in Gym repo)
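Hypothetical command sketch: the actual data-generation entry point lives in the NeMo Gym repo, and the module name and flags below are assumptions, not verified CLI.

```bash
# Hypothetical -- entry point and flags are assumptions; check the Gym repo docs.
python -m nemo_gym.examples.wordle.generate_data \
    --num-games 5000 \
    --output data/wordle_train.jsonl
```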
Run training
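A hedged sketch of the launch command, assuming NeMo-RL's uv-based runner; the script path and override are assumptions, and only the YAML name comes from this PR:

```bash
# Script path is an assumption; the config name matches the file added in this PR.
uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
    --config examples/configs/grpo_wordle_nemotron_nano_v2_9b.yaml \
    cluster.gpus_per_node=8
```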